
<!DOCTYPE html>
<!--[if IEMobile 7 ]><html class="no-js iem7"><![endif]-->
<!--[if lt IE 9]><html class="no-js lte-ie8"><![endif]-->
<!--[if (gt IE 8)|(gt IEMobile 7)|!(IEMobile)|!(IE)]><!--><html class="no-js" lang="en"><!--<![endif]-->
<head>
  <meta charset="utf-8">
  <title>String与Copy On Write - Yebangyu's Blog</title>
  <meta name="author" content="Yebangyu">

  
  <meta name="description" content="介绍了STL中的string实现以及copy on write技术，并讲解了soupen中string的优化和实现">
  <meta name="keywords" content="string copy on write stl soupen">

  <!-- http://t.co/dKP3o1e -->
  <meta name="HandheldFriendly" content="True">
  <meta name="MobileOptimized" content="320">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  
  <link rel="canonical" href="http://www.yebangyu.org/blog/2016/06/04/copy-on-write-in-string-and-soupen/">
  <link href="/favicon.png" rel="icon">
  <link href="/stylesheets/screen.css" media="screen, projection" rel="stylesheet" type="text/css">
  <link href="/atom.xml" rel="alternate" title="Yebangyu's Blog" type="application/atom+xml">
  <script src="/javascripts/modernizr-2.0.js"></script>
  <script src="//ajax.lug.ustc.edu.cn/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
  <script>!window.jQuery && document.write(unescape('%3Cscript src="/javascripts/libs/jquery.min.js"%3E%3C/script%3E'))</script>
  <script src="/javascripts/octopress.js" type="text/javascript"></script>
  <!--Fonts from Google"s Web font directory at http://google.com/webfonts -->
<link href="//fonts.lug.ustc.edu.cn/css?family=PT+Serif:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">
<link href="//fonts.lug.ustc.edu.cn/css?family=PT+Sans:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">
<!-- mathjax config similar to math.stackexchange -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  jax: ["input/TeX", "output/HTML-CSS"],
  tex2jax: {
    inlineMath: [ ['$', '$'] ],
    displayMath: [ ['$$', '$$']],
    processEscapes: true,
    skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
  },
  messageStyle: "none",
  "HTML-CSS": { preferredFont: "TeX", availableFonts: ["STIX","TeX"] }
});
</script>
<script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" type="text/javascript"></script>

  

</head>

<body   >
  <header role="banner"><hgroup>
  <h1><a href="/">Yebangyu's Blog</a></h1>
  
    <h2>Fond of Concurrency Programming and Machine Learning</h2>
  
</hgroup>

</header>
  <nav role="navigation"><ul class="subscription" data-subscription="rss">
  <li><a href="/atom.xml" rel="subscribe-rss" title="subscribe via RSS">RSS</a></li>
  
</ul>
  
<form action="https://www.google.com/search" method="get">
  <fieldset role="search">
    <input type="hidden" name="sitesearch" value="www.yebangyu.org">
    <input class="search" type="text" name="q" results="0" placeholder="Search"/>
  </fieldset>
</form>
  
<ul class="main-navigation">
  <li><a href="/">Blog</a></li>
  <li><a href="/blog/archives">Archives</a></li>
  <li><a href="/about">About Me</a></li>
</ul>

</nav>
  <div id="main">
    <div id="content">
      <div>
<article class="hentry" role="article">
  
  <header>
    
      <h1 class="entry-title">String与Copy On Write</h1>
    
    
      <p class="meta">
        




<time class='entry-date' datetime='2016-06-04T18:13:27+08:00'><span class='date'><span class='date-month'>Jun</span> <span class='date-day'>4</span><span class='date-suffix'>th</span>, <span class='date-year'>2016</span></span> <span class='time'>6:13 pm</span></time>
        
      </p>
    
  </header>


<div class="entry-content"><h2 id="copy-on-write">什么是Copy On Write</h2>

<p>Copy On Write(COW)作为一种优化技术被广泛使用，在string的实现中也不例外。考虑如下的代码：</p>
<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
</pre></td><td class="code"><pre><code class="c++"><span class="line"><span class="n">string</span> <span class="nf">s</span><span class="p">(</span><span class="s">&quot;1234&quot;</span><span class="p">);</span>
</span><span class="line"><span class="n">string</span> <span class="n">t</span> <span class="o">=</span> <span class="n">s</span><span class="p">;</span>
</span><span class="line"><span class="n">cout</span><span class="o">&lt;&lt;</span><span class="n">t</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">&lt;&lt;</span><span class="n">endl</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>第二行，通过调用copy constructor（注意，这里调用的是copy constructor，而不是copy assignment，因为它是从无到有构造对象，而不是设置已有对象）构造对象t，第三行对t中的某个元素进行只读操作。</p>

<p>如果让你实现copy constructor，你会怎么做呢？教科书里的、简单的实现大概是这样：</p>

<!--more-->

<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
<span class="line-number">4</span>
<span class="line-number">5</span>
<span class="line-number">6</span>
<span class="line-number">7</span>
<span class="line-number">8</span>
<span class="line-number">9</span>
<span class="line-number">10</span>
</pre></td><td class="code"><pre><code class="c++"><span class="line"><span class="k">class</span> <span class="nc">string</span>
</span><span class="line"><span class="p">{</span>
</span><span class="line">  <span class="k">private</span><span class="o">:</span>
</span><span class="line">    <span class="kt">char</span> <span class="o">*</span><span class="n">data_</span><span class="p">;</span>
</span><span class="line"><span class="p">};</span>
</span><span class="line"><span class="n">string</span><span class="p">(</span><span class="k">const</span> <span class="n">string</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span>
</span><span class="line"><span class="p">{</span>
</span><span class="line">  <span class="n">data_</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">char</span><span class="p">[</span><span class="n">strlen</span><span class="p">(</span><span class="n">other</span><span class="p">.</span><span class="n">data_</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="cm">/*加1是考虑\0*/</span><span class="p">];</span>
</span><span class="line">  <span class="n">strcpy</span><span class="p">(</span><span class="n">data_</span><span class="p">,</span> <span class="n">other</span><span class="p">.</span><span class="n">data_</span><span class="p">);</span>
</span><span class="line"><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>

<p>好了，既然只是对t进行只读，那么就没完全必要分配内存、拷贝字符串等昂贵操作了，而是复用s的字符串空间即可。这就是我们要谈到的string中的COW。</p>

<h2 id="copy-on-write-1">如何实现Copy On Write</h2>

<p>显然，现在是有多个对象绑定或者说涉及或者说引用到某个字符串缓冲区上了。事实上，正如你能想到的，引用计数（Reference Counting，RC）是实现COW的重要基础。当执行上面最上面代码的第二行时，并不为t分配内存、拷贝字符串，而是相应的字符串的引用计数加1即可。</p>

<p>那么，引用计数应该放在哪里呢？它是string类的普通成员么？不行，这样子每个string对象都有一个引用计数了；成为string类的静态成员变量么？也不行，这样子每个类都只有一个引用计数了。</p>

<p>实际上，我们应该把buffer和引用计数抽离出来，单独放到某处。</p>

<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
<span class="line-number">4</span>
<span class="line-number">5</span>
<span class="line-number">6</span>
<span class="line-number">7</span>
<span class="line-number">8</span>
<span class="line-number">9</span>
<span class="line-number">10</span>
<span class="line-number">11</span>
<span class="line-number">12</span>
<span class="line-number">13</span>
<span class="line-number">14</span>
<span class="line-number">15</span>
</pre></td><td class="code"><pre><code class="c++"><span class="line"><span class="k">struct</span> <span class="n">stringvalue</span>
</span><span class="line"><span class="p">{</span>
</span><span class="line">    <span class="kt">char</span> <span class="o">*</span><span class="n">data_</span><span class="p">;</span>
</span><span class="line">    <span class="kt">int</span> <span class="n">rc_</span><span class="p">;</span>
</span><span class="line"><span class="p">}</span>
</span><span class="line"><span class="k">class</span> <span class="nc">string</span>
</span><span class="line"><span class="p">{</span>
</span><span class="line">  <span class="k">private</span><span class="o">:</span>
</span><span class="line">    <span class="n">stringvalue</span> <span class="o">*</span><span class="n">sv_</span><span class="p">;</span>
</span><span class="line"><span class="p">};</span>
</span><span class="line"><span class="n">string</span><span class="p">(</span><span class="k">const</span> <span class="n">string</span> <span class="o">&amp;</span><span class="n">other</span><span class="p">)</span>
</span><span class="line"><span class="p">{</span>
</span><span class="line">  <span class="n">sv_</span> <span class="o">=</span> <span class="n">other</span><span class="p">.</span><span class="n">sv_</span><span class="p">;</span>
</span><span class="line">  <span class="p">(</span><span class="n">other</span><span class="p">.</span><span class="n">sv_</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">rc_</span><span class="o">++</span><span class="p">;</span>
</span><span class="line"><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>
<p>就这么简单，仅仅是attach，仅仅是增加引用计数。没有memory allocation，没有strcpy或者memcpy了。</p>

<p>如果仅仅是只读操作，那么这没任何问题。但是如果要写了呢？这时候没办法了，必须复制了。这就是所谓lazy evaluation，拖延战术(对应的是eager evaluation)。</p>

<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
<span class="line-number">4</span>
<span class="line-number">5</span>
<span class="line-number">6</span>
<span class="line-number">7</span>
<span class="line-number">8</span>
<span class="line-number">9</span>
<span class="line-number">10</span>
<span class="line-number">11</span>
<span class="line-number">12</span>
<span class="line-number">13</span>
<span class="line-number">14</span>
</pre></td><td class="code"><pre><code class="c++"><span class="line"><span class="kt">void</span> <span class="nf">set_char</span><span class="p">(</span><span class="k">const</span> <span class="kt">int</span> <span class="n">idx</span><span class="p">,</span> <span class="kt">char</span> <span class="n">new_char</span><span class="p">)</span>
</span><span class="line"><span class="p">{</span>
</span><span class="line">  <span class="k">if</span> <span class="p">(</span><span class="n">sv_</span><span class="o">-&gt;</span><span class="n">rc_</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
</span><span class="line">    <span class="c1">//引用计数至少是2。说明除了自己，还有别人也引用到了这个buffer，不得不分配内存了，否则影响到别人了。</span>
</span><span class="line">    <span class="n">stringvalue</span> <span class="o">*</span><span class="n">tmp</span> <span class="o">=</span> <span class="n">sv_</span><span class="p">;</span>
</span><span class="line">    <span class="n">sv_</span> <span class="o">=</span> <span class="k">new</span> <span class="n">stringvalue</span><span class="p">();</span><span class="c1">//重新搞一份</span>
</span><span class="line">    <span class="n">sv_</span><span class="o">-&gt;</span><span class="n">data_</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">char</span><span class="p">[</span><span class="n">strlen</span><span class="p">(</span><span class="n">tmp</span><span class="o">-&gt;</span><span class="n">char_</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
</span><span class="line">    <span class="n">strcpy</span><span class="p">(</span><span class="n">sv_</span><span class="o">-&gt;</span><span class="n">data_</span><span class="p">,</span> <span class="n">tmp</span><span class="o">-&gt;</span><span class="n">char_</span><span class="p">);</span>
</span><span class="line">    <span class="n">sv_</span><span class="o">-&gt;</span><span class="n">rc_</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</span><span class="line">    <span class="o">--</span><span class="n">tmp</span><span class="o">-&gt;</span><span class="n">rc_</span><span class="p">;</span><span class="c1">//把原来的引用计数减1</span>
</span><span class="line">    <span class="c1">//......</span>
</span><span class="line"><span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span><span class="line">  <span class="c1">//......</span>
</span><span class="line"><span class="p">}</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>

<h2 id="copy-on-write-">Copy On Write 会带来什么问题？</h2>

<p>在单线程环境下，使用Copy On Write不容易出问题。不过多线程下，就险象环生了。</p>

<p>如果string内部做同步，那么这无疑增加了string的实现复杂度，并且STL的初衷其实是只考虑单线程环境的。同步操作无疑会带来大量的开销。</p>

<p>如果string内部不做同步，那么问题就来了。</p>

<p>加锁啊，加锁不就完了么？没那么容易。你看：</p>

<div class="bogus-wrapper"><notextile><figure class="code"><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class="line-number">1</span>
<span class="line-number">2</span>
<span class="line-number">3</span>
<span class="line-number">4</span>
<span class="line-number">5</span>
<span class="line-number">6</span>
<span class="line-number">7</span>
</pre></td><td class="code"><pre><code class="c++"><span class="line"> <span class="n">string</span> <span class="n">s</span><span class="p">;</span><span class="c1">//临界资源，全局变量</span>
</span><span class="line"> <span class="kt">void</span> <span class="nf">f</span><span class="p">()</span>
</span><span class="line"> <span class="p">{</span>
</span><span class="line">   <span class="n">lock</span><span class="p">.</span><span class="n">lock</span><span class="p">();</span>
</span><span class="line">   <span class="n">string</span> <span class="n">t</span> <span class="o">=</span> <span class="n">s</span><span class="p">;</span>
</span><span class="line">   <span class="n">lock</span><span class="p">.</span><span class="n">unlock</span><span class="p">();</span>
</span><span class="line">   <span class="c1">//这里可能读写操作t</span>
</span></code></pre></td></tr></table></div></figure></notextile></div>

<p>那么，对t的操作也得加锁，如果有人通过t又复制了一个对象w，那么操作w的时候也得加锁，这太难了。这里，本质困难在于，可能会有很多对象，我们无法控制的对象都绑定到一块buffer上。</p>

<p>不过这样也没什么，毕竟不管有没有使用COW，使用STL时候都不应该对它有多线程安全的幻想。但是使用COW后确实更加危险，因为加锁的难度变大了，甚至都不知道在哪里加了。</p>

<p>想想看，在多线程环境下，假如string使用了COW优化，将c_str()返回的指针强制去掉const，以及operator []的读写可能会有什么问题。</p>

<h2 id="copy-on--write-">使用Copy On  Write 么？</h2>

<p>假如你是库设计和实现者，你会用COW么？</p>

<p><a href="https://github.com/yebangyu/Soupen">Soupen</a>是一款高性能的nosql数据库，旨在能在某些方面替代Redis。它由不著名码农、秦汉史历史学家、本站站长Yebangyu同学在业余时间独立开发完成。</p>

<p>Soupen也实现了自己的string，并不使用Copy On Write(lazy evaluation)。而是使用eager evaluation，立即分配内存，立即拷贝。</p>

<p>在我们的应用中，调用copy constructor后对它进行写（而不是只读）的概率很大，拉拉扯扯各种引用计数加加减减，还不如直接分配+拷贝，反正后面也要做这个工作，何必自寻烦恼。</p>

<p>而且Soupen是服务器，不是类库，使用者是自己而不是广大用户，不需要考虑各种各样的情况。</p>

<p>当然，对于短字符串Soupen是做了优化的，可以参考<a href="http://www.yebangyu.org/blog/2016/05/15/stringinyedis/">这里</a>的源码解读。</p>
</div>


  <footer>
    <p class="meta">
      
  

<span class="byline author vcard">Posted by <span class="fn">Yebangyu</span></span>

      




<time class='entry-date' datetime='2016-06-04T18:13:27+08:00'><span class='date'><span class='date-month'>Jun</span> <span class='date-day'>4</span><span class='date-suffix'>th</span>, <span class='date-year'>2016</span></span> <span class='time'>6:13 pm</span></time>
      

<span class="categories">
  
    <a class='category' href='/blog/categories/c-plus-plus/'>c++</a>
  
</span>


    </p>
    
      <div class="sharing">
  
  
  
</div>

    
    <p class="meta">
      
        <a class="basic-alignment left" href="/blog/2016/05/26/ticketlock/" title="Previous Post: Ticket Lock">&laquo; Ticket Lock</a>
      
      
        <a class="basic-alignment right" href="/blog/2016/06/26/sequence-lock/" title="Next Post: Sequence Lock">Sequence Lock &raquo;</a>
      
    </p>
  </footer>
</article>


</div>

<aside class="sidebar">
  
    <section>
  <h1>Recent Posts</h1>
  <ul id="recent_posts">
    
      <li class="post">
        <a href="/blog/2017/03/11/2017/">2017技术成长之路</a>
      </li>
    
      <li class="post">
        <a href="/blog/2017/02/17/virtualfunctionandvariadicparametertemplate/">虚函数和变长参数模板的妙用</a>
      </li>
    
      <li class="post">
        <a href="/blog/2016/12/25/singleton/">Singleton与多线程</a>
      </li>
    
      <li class="post">
        <a href="/blog/2016/12/04/introductiontohazardpointer/">Lock Free中的Hazard Pointer(下)</a>
      </li>
    
      <li class="post">
        <a href="/blog/2016/12/03/gccandperfopt/">性能优化的那些传说和迷思</a>
      </li>
    
  </ul>
</section>
<section>
  <h1>Friends' Link</h1>
  <ul>
    <li>
	  <li><a href="http://www.chongh.wiki/">Diting0x</a></li>
	  <li><a href="http://www.xiaolili.net/">wangli</a></li>
	  <li><a href="http://www.skykewei.top">dukewei</a></li>
	  <li><a href="http://irwenqiang.github.io">chenwenqiang</a></li>
      <li><a href="http://www.armsword.com">duruofei</a></li>
    </li>
  </ul>
</section><section>
  <h1>Yebangyu</h1>
  <p>福建人。热爱历史、K歌、NBA</p>
  <p>帝都码农</p>
  <p>历史学家，专治秦汉史</p>
</section>
<section>
 <h1>Categories</h1>
 <ul id="categories">
  <li class='category'><a href='/blog/categories/c-plus-plus/'>c++ (11)</a></li>
<li class='category'><a href='/blog/categories/soupen/'>soupen (4)</a></li>
<li class='category'><a href='/blog/categories/web/'>web (1)</a></li>
<li class='category'><a href='/blog/categories/qi-ta/'>其他 (5)</a></li>
<li class='category'><a href='/blog/categories/li-shi/'>历史 (2)</a></li>
<li class='category'><a href='/blog/categories/bing-xing-bian-cheng/'>并行编程 (20)</a></li>
<li class='category'><a href='/blog/categories/xing-neng-you-hua/'>性能优化 (2)</a></li>
<li class='category'><a href='/blog/categories/suan-fa/'>算法 (4)</a></li>
<li class='category'><a href='/blog/categories/bian-yi-lian-jie/'>编译链接 (2)</a></li>

 </ul>
</section>




  
</aside>


    </div>
  </div>
  <footer role="contentinfo"><!-- mathjax config similar to math.stackexchange -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  jax: ["input/TeX", "output/HTML-CSS"],
  tex2jax: {
    inlineMath: [ ['$', '$'] ],
    displayMath: [ ['$$', '$$']],
    processEscapes: true,
    skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
  },
  messageStyle: "none",
  "HTML-CSS": { preferredFont: "TeX", availableFonts: ["STIX","TeX"] }
});
</script>
<script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" type="text/javascript"></script>
<p>
  Copyright &copy; 2017 - Yebangyu -
  <span class="credit">Powered by <a href="http://octopress.org">Octopress</a></span>
</p>
<script type="text/javascript">var cnzz_protocol = (("https:" == document.location.protocol) ? " https://" : " http://");document.write(unescape("%3Cspan id='cnzz_stat_icon_1257548193'%3E%3C/span%3E%3Cscript src='" + cnzz_protocol + "s11.cnzz.com/z_stat.php%3Fid%3D1257548193' type='text/javascript'%3E%3C/script%3E"));</script>

</footer>
  











</body>
</html>
